Incorporation of a Valency Lexicon into a TectoMT Pipeline
Authors
Abstract
In this paper, we focus on the incorporation of a valency lexicon into the TectoMT system for the Czech-Russian language pair. We demonstrate valency errors in MT output and describe how the introduction of the lexicon influenced the translation results. Although there was no impact on the BLEU score, manual inspection of concrete cases showed some improvement.
Similar resources
Czechizator - Čechizátor
We present a lexicon-less rule-based machine translation system from English to Czech, based on a very limited amount of transformation rules. Its core is a novel translation module, implemented as a component of the TectoMT translation system, and depends massively on the extensive pipeline of linguistic preprocessing and postprocessing within TectoMT. Its scope is naturally limited, but for s...
From PropBank to EngValLex: Adapting the PropBank-Lexicon to the Valency Theory of the Functional Generative Description
EngValLex is the name of an FGD-compliant valency lexicon of English verbs, built from the PropBank-Lexicon and following the structure of Vallex, the FGD-based lexicon of Czech verbs. EngValLex is interlinked with the PropBank-Lexicon, thus preserving the original links between the PropBank-Lexicon and the PropBank-Corpus. Therefore it is also supposed to be part of corpus annotation. This pap...
Building the PDT-Vallex valency lexicon
In our contribution, we relate the development of a richly annotated corpus and a computational valency lexicon. Our valency lexicon, called PDT-Vallex (Hajič et al., 2003) has been created as a “byproduct” of the annotation of the Prague Dependency Treebank (PDT) but it became an important resource for further linguistic research as well as for computational processing of the Czech language. W...
Annotation Lexicons: Using the Valency Lexicon for Tectogrammatical Annotation
We present a formalization of the valency theory (Panevová, 1974) that fits the stratificational representation scheme used in the Prague Dependency Treebank. The notion of a lexicon as a repository of “static” (invariable, or context-independent) source of information is formally presented; a different type of lexicon is used at every layer of sentence representation, with a formal link to thi...
Valency Lexicon of Czech Verbs: Towards Formal Description of Valency and Its Modeling in an Electronic Language Resource
Valency refers to the capacity of verb (or a word belonging to another part of speech) to take a specific number and type of syntactically dependent language units. Valency information is thus related to particular lexemes and as such it is necessary to describe valency characteristics for separate lexemes in the form of lexicon entries. A valency lexicon is indispensable for any complex Natura...